##        X          fixed.acidity   volatile.acidity   citric.acid
##  Min.   :   1.0   Min.   : 4.60   Min.   :0.1200     Min.   :0.000
##  1st Qu.: 400.5   1st Qu.: 7.10   1st Qu.:0.3900     1st Qu.:0.090
##  Median : 800.0   Median : 7.90   Median :0.5200     Median :0.260
##  Mean   : 800.0   Mean   : 8.32   Mean   :0.5278     Mean   :0.271
##  3rd Qu.:1199.5   3rd Qu.: 9.20   3rd Qu.:0.6400     3rd Qu.:0.420
##  Max.   :1599.0   Max.   :15.90   Max.   :1.5800     Max.   :1.000
##  residual.sugar      chlorides      free.sulfur.dioxide
##  Min.   : 0.900   Min.   :0.01200   Min.   : 1.00
##  1st Qu.: 1.900   1st Qu.:0.07000   1st Qu.: 7.00
##  Median : 2.200   Median :0.07900   Median :14.00
##  Mean   : 2.539   Mean   :0.08747   Mean   :15.87
##  3rd Qu.: 2.600   3rd Qu.:0.09000   3rd Qu.:21.00
##  Max.   :15.500   Max.   :0.61100   Max.   :72.00
##  total.sulfur.dioxide      density            pH           sulphates
##  Min.   :  6.00         Min.   :0.9901   Min.   :2.740   Min.   :0.3300
##  1st Qu.: 22.00         1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500
##  Median : 38.00         Median :0.9968   Median :3.310   Median :0.6200
##  Mean   : 46.47         Mean   :0.9967   Mean   :3.311   Mean   :0.6581
##  3rd Qu.: 62.00         3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300
##  Max.   :289.00         Max.   :1.0037   Max.   :4.010   Max.   :2.0000
##     alcohol         quality
##  Min.   : 8.40   Min.   :3.000
##  1st Qu.: 9.50   1st Qu.:5.000
##  Median :10.20   Median :6.000
##  Mean   :10.42   Mean   :5.636
##  3rd Qu.:11.10   3rd Qu.:6.000
##  Max.   :14.90   Max.   :8.000
This dataset contains 1,599 red wines with 13 variables each.
Wine quality in the dataset is approximately normally distributed. Most red wines have a quality score of 5 or 6 (681 and 638 wines, respectively).
Volatile acidity is right skewed, with a median of 0.52.
Alcohol content is reasonably well spread out but slightly right skewed, with a median of 10.20.
pH is approximately normally distributed, and most wines fall between 3 and 4 on the pH scale. The median is 3.31.
Residual sugar is strongly right skewed, and it is important to note that there are a large number of outliers. The median is 2.2.
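"Right skewed" can also be checked numerically. The summary table already hints at it (the mean of 0.5278 sits above the median of 0.52), and a skewness statistic makes it explicit. The sketch below assumes the simple moment-based sample skewness g1 = m3 / m2^1.5, where positive values indicate a longer right tail; the sample values are hypothetical, not taken from the dataset.

```python
def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical volatile acidity readings with a long right tail.
sample = [0.30, 0.40, 0.45, 0.50, 0.52, 0.55, 0.60, 0.70, 1.20, 1.58]
print(round(skewness(sample), 2))
```

A symmetric sample such as `[1.0, 2.0, 3.0]` gives a skewness of 0, while the long tail above gives a positive value.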
After calculating the upper fence (Q3 + 1.5 × IQR = 3.65), the refined boxplot is easier to read. However, the distribution plot raises a question: why is there a gap at 2.7? Are there really no wines with that level of sugar?
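The 3.65 fence follows Tukey's 1.5 × IQR rule. A minimal Python sketch, assuming that rule and plugging in the residual sugar quartiles from the `summary()` output above (1st Qu. 1.9, 3rd Qu. 2.6); the helper name `tukey_fences` is hypothetical:

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles of residual.sugar taken from the summary() table.
lower, upper = tukey_fences(1.9, 2.6)
print(f"lower={lower:.2f}, upper={upper:.2f}")  # lower=0.85, upper=3.65
```

Because the lower fence (0.85) is below the observed minimum of 0.9, only the upper fence flags any residual sugar values as outliers.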
As with residual sugar, chlorides have a large number of outliers, and the distribution is strongly right skewed. The median is 0.079.
After calculating the upper and lower fences (0.12 and 0.04, respectively), the refined boxplot is easier to read.
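A sketch of the "refined" subset, assuming it simply keeps the values that lie between the two fences (the exact filter used for the refined plot is not shown). The fences come from the chloride quartiles in the summary table (0.070 and 0.090); the sample readings and the helper name `within_fences` are hypothetical.

```python
def within_fences(values, lower, upper):
    """Keep only values inside the closed interval [lower, upper]."""
    return [v for v in values if lower <= v <= upper]

# Tukey fences from the chloride quartiles: roughly 0.04 and 0.12.
q1, q3 = 0.070, 0.090
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical chloride readings, not taken from the dataset.
sample = [0.012, 0.045, 0.079, 0.090, 0.118, 0.200, 0.611]
refined = within_fences(sample, lower, upper)
print(refined)
```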
There are 1,599 observations in the dataset, each with 11 chemical features (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol) plus a quality score.
- The quality of the wines is approximately normally distributed.
- The distributions of most features are right skewed.
- The maximum quality score is 8 and the median score is 6.
The main question I want to answer with this dataset is which chemical properties influence the quality of red wine. I would like to determine which features have the largest effect in predicting wine quality.
Citric acid and alcohol may also support the investigation, but I suspect that volatile acidity plays a major role in determining wine quality.
I may also create a refined dataset without the outliers in chemical properties such as residual sugar and chlorides.
Nope
## [1] "volatile.acidity" "alcohol" "citric.acid"
## [4] "density" "pH" "residual.sugar"
## [7] "quality"
The correlation between alcohol and wine quality is 0.476. The median alcohol level rises as quality improves, from about 9 to 12. The plot indicates a positive relationship between alcohol and quality, especially for wines with quality between 5 and 6.
The correlation between volatile acidity and wine quality is about -0.4. The median volatile acidity falls as quality improves, from 0.8 down to about 0.4. The plot indicates a negative relationship between volatile acidity and quality.
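The 0.476 figure is presumably a Pearson correlation coefficient, which is the default of R's `cor()`. A from-scratch Python sketch of that coefficient, run on a few hypothetical (alcohol, quality) pairs rather than the real data:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical (alcohol %, quality) pairs, for illustration only.
alcohol = [9.4, 9.8, 10.0, 10.5, 11.2, 12.0]
quality = [5, 5, 6, 5, 6, 7]
print(round(pearson(alcohol, quality), 3))
```

The coefficient only measures linear association; it is bounded between -1 and 1, and perfectly linear data such as `[1, 2, 3]` against `[2, 4, 6]` gives 1.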
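The falling medians come from grouping the observations by quality score. A sketch of that grouping with Python's standard library; the helper name `median_by_group` and the (quality, volatile acidity) rows are hypothetical:

```python
from collections import defaultdict
from statistics import median

def median_by_group(rows):
    """Group (key, value) pairs by key and return {key: median of values}, sorted by key."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: median(vals) for key, vals in sorted(groups.items())}

# Hypothetical (quality, volatile acidity) observations.
rows = [(3, 0.88), (3, 0.80), (5, 0.60), (5, 0.55), (5, 0.58),
        (6, 0.50), (6, 0.47), (8, 0.38), (8, 0.42)]
print(median_by_group(rows))
```

Medians are used instead of means because they are less sensitive to the outliers discussed earlier.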
The plot shows a negative relationship (correlation -0.5) between alcohol and density.
The plot shows a positive relationship (correlation 0.35) between residual sugar and density.
From the plot above, alcohol has the strongest relationship with quality. Volatile acidity also has a relatively strong correlation with quality, but the correlation is negative.
Although density is only weakly correlated with quality, the plot shows clear correlations between density and both alcohol and residual sugar. Not surprisingly, the relationship between alcohol and density is negative, and the relationship between residual sugar and density is positive (alcohol is less dense than water, while dissolved sugar makes wine denser).
The plot shows a negative relationship between volatile acidity and citric acid.
When alcohol and volatile acidity are considered together, their influence on red wine quality is less pronounced. The regression lines in the plot above stand out only for wines with quality 3 and 8. Note, however, that there are few wines with quality 3 or 8 compared with quality 5 and 6.
In contrast, the plot above shows that, holding the level of residual sugar constant, alcohol has a clear influence on quality, and this influence is visible at every quality level.
If we control for residual sugar, however, the influence of volatile acidity on quality is weaker than that of alcohol.
When I combine alcohol and volatile acidity, their influence on red wine quality weakens. The effect seems to hold only for extremely good and extremely bad wines.
When I control for residual sugar, the influence of alcohol and volatile acidity on quality matches what the previous section suggested. When I put alcohol and volatile acidity together, however, the influence is weaker than the previous section showed.
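Regression lines drawn per quality level are presumably separate least-squares fits for each group, as ggplot2's `geom_smooth(method = "lm")` produces when the data are grouped by colour. A minimal ordinary least squares sketch of that idea, assuming alcohol on the x axis and volatile acidity on the y axis; the rows and the helper names `ols_line` and `lines_by_quality` are hypothetical.

```python
from collections import defaultdict

def ols_line(xs, ys):
    """Least-squares fit y = intercept + slope * x (needs at least two distinct x values)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def lines_by_quality(rows):
    """rows: (quality, x, y) triples; returns {quality: (intercept, slope)}."""
    groups = defaultdict(lambda: ([], []))
    for q, x, y in rows:
        groups[q][0].append(x)
        groups[q][1].append(y)
    return {q: ols_line(xs, ys) for q, (xs, ys) in sorted(groups.items())}

# Hypothetical (quality, alcohol, volatile acidity) rows.
rows = [(5, 9.0, 0.70), (5, 10.0, 0.60), (5, 11.0, 0.50),
        (8, 11.0, 0.40), (8, 12.0, 0.38), (8, 13.0, 0.36)]
for q, (b0, b1) in lines_by_quality(rows).items():
    print(f"quality {q}: intercept={b0:.3f}, slope={b1:.3f}")
```

With only a handful of wines at quality 3 or 8, each of those fits rests on very few points, which is worth keeping in mind when reading their lines.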
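The analysis controls for sugar visually. One numeric alternative, swapped in here and not the method used above, is the first-order partial correlation, which removes the linear effect of a third variable z from both x and y:

r_xy·z = (r_xy − r_xz · r_yz) / sqrt((1 − r_xz²)(1 − r_yz²))

A Python sketch of that formula. Only the 0.476 alcohol/quality correlation comes from the text; the two sugar correlations are hypothetical placeholders.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z from both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_alcohol_quality = 0.476  # from the text above
r_alcohol_sugar = 0.04     # hypothetical placeholder
r_quality_sugar = 0.01     # hypothetical placeholder
print(round(partial_corr(r_alcohol_quality, r_alcohol_sugar, r_quality_sugar), 3))
```

When z is nearly uncorrelated with both variables, the partial correlation stays close to the raw correlation; a large drop would indicate that z explains much of the original relationship.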
The plot shows a large number of residual sugar outliers. However, these outliers do not appear to matter much for quality, especially compared with another chemical property such as alcohol.
The plot shows a negative influence of volatile acidity on quality. As in the previous plot, the large number of outliers does not seem to affect alcohol's influence.
I would like to see how the large number of outliers affects these relationships. From the plot above, with alcohol and volatile acidity, it is not clear how those two chemical properties influence red wines scored from 4 to 7.
This dataset contains 1,599 red wines with 13 variables each. My main question is which chemical properties influence the quality of red wine, and how strong those relationships actually are.
My analysis suggests that both alcohol and volatile acidity play an important role in influencing quality, but in opposite directions. The influence of alcohol remains strong when residual sugar is controlled for, while this does not hold for volatile acidity. In addition, we need to be aware of the large number of outliers in chemical properties such as residual sugar, alcohol, and volatile acidity, especially for red wines scored 4, 5, and 6. A multivariate plot shows that, when volatile acidity and alcohol are considered together, their influence is clear only for wines with quality 3 and 8.
Further research would require more observations and a more advanced regression model.
The main difficulty in my analysis was measuring the relationship between the given chemical properties and red wine quality. It looks easy at first, since a simple regression can handle a single property, but it becomes more complicated than I originally thought once I combine several chemical properties. Multivariable regression is also tricky here, since several of the control variables are linearly related to one another.
Data visualization made my work easier. Based on some plots, I found plenty of outliers among those chemical properties, so I wondered whether they might shed some light on my analysis. The results show that the influence of alcohol is robust, especially when controlling for residual sugar, while volatile acidity is less robust than alcohol. In addition, both properties are associated with wines of high quality, so I wonder whether their combined chemical effect acts as a threshold for making great wine. Second, I could of course remove the outliers, but the dataset has a limited number of observations. Removing the outliers also leaves a gap in the red wine data, which I am not able to deal with at the moment. I would suggest collecting more observations.
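Predictors that are linearly related to one another are said to be multicollinear, and a standard diagnostic for this is the variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. With only two predictors, R_j² is simply their squared correlation, so the sketch below assumes that two-predictor case and uses the -0.5 alcohol/density correlation reported earlier. The helper name `vif_two_predictors` is hypothetical.

```python
def vif_two_predictors(r):
    """VIF for either of two predictors whose pairwise correlation is r.

    With two predictors, regressing one on the other gives R^2 = r^2,
    so VIF = 1 / (1 - r^2).
    """
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1 / (1 - r ** 2)

print(vif_two_predictors(-0.5))  # alcohol vs. density correlation from the text
```

A VIF of 1 means no inflation of a coefficient's variance; with more than two predictors, R_j² has to come from a full regression on all the other predictors rather than from a single correlation.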